ggplot2

Quantitative Methodology (UPF)

Jordi Mas Elias

https://www.jordimas.cat/

Summary

  • Layers
  • Row dplyr functions
  • Column dplyr functions
  • Transform variables

Warm up

R learning curve

UPF Inclusive Growth Index (IGI)

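library(dplyr)
igi <- rendacs |>
  mutate(
    # standardized income: higher income -> higher score
    zs_gdp = (import_euros - mean(import_euros)) / sd(import_euros),
    # standardized Gini with the sign reversed: lower inequality -> higher score
    zs_gini = (mean(index_gini) - index_gini) / sd(index_gini),
    # IGI: simple average of the two z-scores
    upf_index = (zs_gdp + zs_gini) / 2) |> 
  select(nom_barri, secc = seccio_censal, zs_gdp:upf_index) |> 
  arrange(desc(upf_index))   # best-ranked census sections first
head(igi, 10)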
# A tibble: 10 × 5
   nom_barri                      secc  zs_gdp zs_gini upf_index
   <chr>                         <dbl>   <dbl>   <dbl>     <dbl>
 1 les Corts                        24  3.59   -0.492       1.55
 2 Sant Andreu                      28  0.718   2.28        1.50
 3 la Salut                         23  1.50    1.09        1.29
 4 les Corts                        34  1.37    1.19        1.28
 5 la Maternitat i Sant Ramon       46  1.56    0.959       1.26
 6 la Vila Olímpica del Poblenou    53  2.47    0.0426      1.26
 7 Sant Gervasi- la Bonanova        39  2.84   -0.416       1.21
 8 Provençals del Poblenou         104  0.204   2.21        1.21
 9 Canyelles                        63 -0.0334  2.44        1.20
10 les Tres Torres                  28  3.80   -1.46        1.17

UPF Inclusive Growth Index (IGI)

tail(igi, 10)
# A tibble: 10 × 5
   nom_barri                              secc zs_gdp zs_gini upf_index
   <chr>                                 <dbl>  <dbl>   <dbl>     <dbl>
 1 Sant Pere, Santa Caterina i la Ribera    51 -1.15    -1.54     -1.34
 2 el Raval                                  3 -1.27    -1.43     -1.35
 3 el Barri Gòtic                           30 -0.989   -1.74     -1.36
 4 Sant Pere, Santa Caterina i la Ribera    48 -1.36    -1.38     -1.37
 5 Sant Pere, Santa Caterina i la Ribera    47 -1.04    -1.71     -1.38
 6 el Putxet i el Farró                     81 -1.69    -1.15     -1.42
 7 el Raval                                  6 -1.17    -2.02     -1.59
 8 el Barri Gòtic                           25 -0.924   -2.43     -1.68
 9 el Barri Gòtic                           29 -1.30    -2.27     -1.79
10 el Barri Gòtic                           31 -1.08    -3.04     -2.06

Warm up

Paint the fence, first…

Warm up

…karate later.

Warm up

Data wrangling

Made with ggplot

Layers

Basic layers

Almost always, a ggplot consists of three layers¹:

    1. Data frame
    2. Aesthetics
    3. Geometry
df |>                          # 1. data frame
  ggplot(aes(aesthetics)) +    # 2. aesthetics
  geom_xxx()                   # 3. geometry
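A minimal runnable instance of the three basic layers. As an assumption for illustration, it uses the built-in mtcars data instead of the course data (and the native pipe, which needs R 4.1 or later):

```r
library(ggplot2)

# 1. data frame: mtcars ships with R
# 2. aesthetics: weight on x, miles per gallon on y
# 3. geometry: one point per car
p <- mtcars |>
  ggplot(aes(x = wt, y = mpg)) +
  geom_point()
```

Typing `p` at the console draws the plot.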

Optional layers

Optionally, we add more layers, such as:

    1. Facet
    2. Coordinates
    3. Scale
    4. Theme
    5. Etc.
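A sketch of the optional layers added on top of the basic three, again on mtcars (an assumption, not the course data). The specific choices (facet by cylinders, a y-axis zoom, a renamed x axis, `theme_minimal()`) are only illustrative:

```r
library(ggplot2)

p <- mtcars |>
  ggplot(aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~cyl) +                                 # facet: one panel per cylinder count
  coord_cartesian(ylim = c(10, 35)) +                # coordinates: zoom the y axis
  scale_x_continuous(name = "Weight (1000 lbs)") +   # scale: control the x axis
  theme_minimal()                                    # theme: non-data appearance
```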

Layered example

Example of a fully equipped plot.

library(ggplot2)
library(ggpol)   # facet_share() comes from the ggpol package, not from ggplot2

# bins: data frame with columns pvote, n and type (not shown here)
bins |> 
  ggplot(aes(x = pvote, y = n, fill = type)) +
  geom_col(show.legend = FALSE) +                 # same as geom_bar(stat = "identity")
  geom_hline(yintercept = 0, linewidth = 0.3) +   # linewidth replaces size for lines since ggplot2 3.4.0
  scale_fill_manual(values = c("grey65", "grey35")) +
  facet_share(~type, dir = "h", scales = "free", reverse_num = TRUE) +
  coord_flip() +
  labs(x = NULL, fill = NULL, y = "Vote") +
  theme(panel.background = element_blank(),
        strip.background = element_blank(),
        strip.text = element_text(size = 16),
        text = element_text(size = 15),
        axis.line.x = element_line(linewidth = 0.3),
        axis.title.x = element_text(vjust = 125, size = 14))

Aesthetics

Cartesian coordinates

  • x and y.

Aesthetics

Inside aes(), we map visual properties to variables:

rendacs |> 
  ggplot(aes(x = import_euros, y = index_gini, col = nom_districte)) +
  geom_point()

  • x: variable on the horizontal axis.
  • y: variable on the vertical axis.
  • col: color of the geometry (here, one color per district).

Aesthetics vs. attributes

Aesthetics map a variable to a visual property. They always go inside the aes() function. E.g.:

  • x = gdp
  • col = continent

Attributes set a fixed visual characteristic of the geometry. They go outside aes(), normally in the geom_xxx() function. E.g.:

  • col = "red"
  • size = 2

Aesthetics vs. attributes

Color as aesthetic

rendacs |> 
  ggplot(aes(x = import_euros, 
             y = index_gini, 
             col = nom_districte)) +
  geom_point()

Color as attribute

rendacs |> 
  ggplot(aes(x = import_euros, 
             y = index_gini)) +
  geom_point(col = "red")
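One way to see the difference is to inspect the data ggplot2 computes for a layer with `layer_data()`. This sketch uses mtcars as a stand-in for the course data (an assumption):

```r
library(ggplot2)

# Colour as aesthetic: mapped from a variable, so it varies by group
p_aes <- ggplot(mtcars, aes(x = wt, y = mpg, col = factor(cyl))) +
  geom_point()

# Colour as attribute: one fixed value for every point
p_attr <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(col = "red")

# layer_data() returns the data ggplot2 actually draws for a layer
unique(layer_data(p_aes)$colour)    # three colours, one per cylinder count
unique(layer_data(p_attr)$colour)   # only "red"
```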

Aesthetics vs. attributes

Types of attributes:

  • size: size of the geometry.
  • alpha: transparency.
  • label: text to display (geom_text(), geom_label()).
  • fill: interior color of bars, polygons, and other filled shapes.
  • shape: mostly for points.
  • linetype: for lines.
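A sketch setting several of these attributes at once, on mtcars (an assumption for illustration). Shape 21 is a filled circle, so `fill` applies to it; the dashed line comes from a linear fit:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # fixed size, transparency, shape and fill for every point
  geom_point(size = 3, alpha = 0.5, shape = 21, fill = "steelblue") +
  # fixed line type and colour for the fitted line
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
              linetype = "dashed", col = "black")
```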

Geometries

Geometries

  • geom_bar()
  • geom_col()
  • geom_point()
  • geom_boxplot()
  • geom_smooth()
  • … and dozens more geometries!

One categorical variable

Bar plot (I)

Counts, etc.
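A minimal bar plot of counts, assuming the `mpg` data that ships with ggplot2 (`class` is the type of car):

```r
library(ggplot2)

# geom_bar() counts rows itself (stat = "count"): bar height = cars per class
p <- ggplot(mpg, aes(x = class)) +
  geom_bar()
```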

One numeric variable

Histogram

Density plot

Dot plot

Semi-continuous, few cases

Extra: fill.
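A sketch of the three geometries for one numeric variable, assuming `mpg` (ships with ggplot2) and mtcars. The dot plot uses `fill` for a second variable; the `stackgroups`/`binpositions` settings keep the coloured dots stacked in shared bins:

```r
library(ggplot2)

# Histogram: counts of highway miles per gallon in bins of width 2
p_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)

# Density plot: smoothed estimate of the same distribution
p_dens <- ggplot(mpg, aes(x = hwy)) +
  geom_density()

# Dot plot: one dot per case, readable with few cases; fill = cylinders
p_dot <- ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
  geom_dotplot(binwidth = 1, stackgroups = TRUE, binpositions = "all")
```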

Two categorical variables

Bar plot (II)
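With two categorical variables, one goes on x and the other on `fill`. A sketch on `mpg` (an assumption); `position` chooses how the coloured parts are arranged:

```r
library(ggplot2)

# Count cars per class, split by drive type (drv)
p <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "dodge")   # "stack" (default) and "fill" (proportions) also work
```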

One categorical, one numeric

Bar plot (III)

geom_col()
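Unlike geom_bar(), geom_col() draws bar heights that are already in the data, so the numeric summary is computed first. A sketch on `mpg` with dplyr (the mean is an illustrative choice):

```r
library(ggplot2)
library(dplyr)

p <- mpg |>
  group_by(class) |>
  summarise(mean_hwy = mean(hwy)) |>   # one row per class
  ggplot(aes(x = class, y = mean_hwy)) +
  geom_col()                           # equivalent to geom_bar(stat = "identity")
```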

One numeric variable

Boxplot (I)

One numeric variable

Boxplot (II)
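A sketch of both boxplot variants on `mpg` (an assumption): a single box for one numeric variable, and one box per category:

```r
library(ggplot2)

# One box: median, quartiles and outliers of hwy
p_one <- ggplot(mpg, aes(y = hwy)) +
  geom_boxplot()

# One box per drive type
p_by <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot()
```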

One numeric variable

Violin plot (I)

One numeric variable

Violin plot (II)
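A violin shows the mirrored density of a numeric variable; a narrow boxplot on top adds the quartiles. A sketch on `mpg` (an assumption):

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_violin() +              # density of hwy for each drive type
  geom_boxplot(width = 0.1)    # narrow box inside each violin
```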

Geometries. Numeric over time

One numeric variable over time

Line plot (I)

One numeric variable over time

Line plot (II)
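A minimal line plot over time, assuming the `economics` data that ships with ggplot2 (monthly US data; `unemploy` is the number of unemployed, in thousands):

```r
library(ggplot2)

# geom_line() joins the points in order of x (here, the date)
p <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()
```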

One numeric variable over time

Path

https://www.jordimas.cat/courses/fiiei_cat/socioeconomics/fonts_indicadors_cat_kuznets/#fig:kuznets-usa
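geom_line() joins points in order of x, while geom_path() joins them in the order of the rows. That is what a trajectory of two numeric variables over time needs. A sketch on `economics` (an assumption, not the Kuznets data linked above), where rows are already in date order:

```r
library(ggplot2)

# Unemployment share vs. personal savings rate, joined month by month
p <- ggplot(economics, aes(x = unemploy / pop, y = psavert)) +
  geom_path()
```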

One numeric variable across categories

Bars (I)

One categorical variable (count)

Bars (II)

One categorical variable

Other layers

Facet

Split one plot into panels, one per value of a variable.
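A sketch of the two facet functions on `mpg` (an assumption): facet_wrap() for one variable, facet_grid() for rows and columns defined by two variables:

```r
library(ggplot2)

# One panel per drive type
p_wrap <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~drv)

# Rows by drive type, columns by model year
p_grid <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(rows = vars(drv), cols = vars(year))
```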

Advanced

  • Put the aesthetics inside the geom function, so they apply only to that layer.
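A sketch on `mpg` (an assumption): the colour mapping lives only in geom_point(), so geom_smooth() does not split by drive type and fits a single line to all cars:

```r
library(ggplot2)

p <- ggplot(mpg) +
  # colour mapping only in this layer: points coloured by drive type
  geom_point(aes(x = displ, y = hwy, col = drv)) +
  # no colour mapping here: one linear fit for all cars
  geom_smooth(aes(x = displ, y = hwy), method = "lm", formula = y ~ x)
```

If `col = drv` were placed in ggplot(aes(...)) instead, geom_smooth() would also fit one line per drive type.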